{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Supervised Learning with GCN\n",
    "\n",
    "Graph neural networks (GNNs) combines superiority of both graph analytics and machine learning. \n",
    "GraphScope provides the capability to process learning tasks. In this tutorial, we demostrate \n",
    "how GraphScope trains a model with GCN.\n",
    "\n",
    "The learning task is node classification on a citation network. In this task, the algorithm has \n",
    "to determine the label of the nodes in [Cora](https://linqs.soe.ucsc.edu/data) dataset. \n",
    "The dataset consists of academic publications as the nodes and the citations between them as the links: if publication A cites publication B, then the graph has an edge from A to B. The nodes are classified into one of seven subjects, and our model will learn to predict this subject.\n",
    "\n",
    "In this task, we use Graph Convolution Network (GCN) to train the model. The core of the GCN neural network model is a \"graph convolution\" layer. This layer is similar to a conventional dense layer, augmented by the graph adjacency matrix to use information about a node's connections.\n",
    "\n",
    "This tutorial has the following steps:\n",
    "\n",
    "- Launching learning engine and attaching the loaded graph.\n",
    "- Defining train process with builtin GCN model and config hyperparameters\n",
    "- Training and evaluating\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install graphscope package if you are NOT in the Playground\n",
    "\n",
    "!pip3 install graphscope\n",
    "!pip3 uninstall -y importlib_metadata  # Address an module conflict issue on colab.google. Remove this line if you are not on colab."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the graphscope module.\n",
    "\n",
    "import graphscope\n",
    "\n",
    "graphscope.set_option(show_log=False)  # enable logging"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load cora dataset\n",
    "\n",
    "from graphscope.dataset import load_cora\n",
    "\n",
    "graph = load_cora()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "Then, we need to define a feature list for training. The training feature list should be seleted from the vertex properties. In this case, we choose all the properties prefix with \"feat_\" as the training features.\n",
    "\n",
    "With the featrue list, next we launch a learning engine with the [graphlearn](https://graphscope.io/docs/reference/session.html#graphscope.Session.graphlearn) method of graphscope. \n",
    "\n",
    "In this case,  we specify the GCN training over \"paper\" nodes and \"cites\" edges.\n",
    "\n",
    "With \"gen_labels\", we split the \"paper\" nodes into three parts, 75% are used as training set, 10% are used for validation and 15% used for testing.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define the features for learning\n",
    "paper_features = []\n",
    "for i in range(1433):\n",
    "    paper_features.append(\"feat_\" + str(i))\n",
    "\n",
    "# launch a learning engine.\n",
    "lg = graphscope.graphlearn(\n",
    "    graph,\n",
    "    nodes=[(\"paper\", paper_features)],\n",
    "    edges=[(\"paper\", \"cites\", \"paper\")],\n",
    "    gen_labels=[\n",
    "        (\"train\", \"paper\", 100, (0, 75)),\n",
    "        (\"val\", \"paper\", 100, (75, 85)),\n",
    "        (\"test\", \"paper\", 100, (85, 100)),\n",
    "    ],\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We use the builtin GCN model to define the training process. You can find more detail about all the builtin learning models on [Graph Learning Model](https://graphscope.io/docs/learning_engine.html#data-model)\n",
    "\n",
    "In the example, we use tensorflow as \"NN\" backend trainer."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import graphscope.learning\n",
    "from graphscope.learning.examples import GCN\n",
    "from graphscope.learning.graphlearn.python.model.tf.optimizer import get_tf_optimizer\n",
    "from graphscope.learning.graphlearn.python.model.tf.trainer import LocalTFTrainer\n",
    "\n",
    "# supervised GCN.\n",
    "\n",
    "\n",
    "def train(config, graph):\n",
    "    def model_fn():\n",
    "        return GCN(\n",
    "            graph,\n",
    "            config[\"class_num\"],\n",
    "            config[\"features_num\"],\n",
    "            config[\"batch_size\"],\n",
    "            val_batch_size=config[\"val_batch_size\"],\n",
    "            test_batch_size=config[\"test_batch_size\"],\n",
    "            categorical_attrs_desc=config[\"categorical_attrs_desc\"],\n",
    "            hidden_dim=config[\"hidden_dim\"],\n",
    "            in_drop_rate=config[\"in_drop_rate\"],\n",
    "            neighs_num=config[\"neighs_num\"],\n",
    "            hops_num=config[\"hops_num\"],\n",
    "            node_type=config[\"node_type\"],\n",
    "            edge_type=config[\"edge_type\"],\n",
    "            full_graph_mode=config[\"full_graph_mode\"],\n",
    "        )\n",
    "\n",
    "    graphscope.learning.reset_default_tf_graph()\n",
    "    trainer = LocalTFTrainer(\n",
    "        model_fn,\n",
    "        epoch=config[\"epoch\"],\n",
    "        optimizer=get_tf_optimizer(\n",
    "            config[\"learning_algo\"], config[\"learning_rate\"], config[\"weight_decay\"]\n",
    "        ),\n",
    "    )\n",
    "    trainer.train_and_evaluate()\n",
    "\n",
    "\n",
    "# define hyperparameters\n",
    "config = {\n",
    "    \"class_num\": 7,  # output dimension\n",
    "    \"features_num\": 1433,\n",
    "    \"batch_size\": 140,\n",
    "    \"val_batch_size\": 300,\n",
    "    \"test_batch_size\": 1000,\n",
    "    \"categorical_attrs_desc\": \"\",\n",
    "    \"hidden_dim\": 128,\n",
    "    \"in_drop_rate\": 0.5,\n",
    "    \"hops_num\": 2,\n",
    "    \"neighs_num\": [5, 5],\n",
    "    \"full_graph_mode\": False,\n",
    "    \"agg_type\": \"gcn\",  # mean, sum\n",
    "    \"learning_algo\": \"adam\",\n",
    "    \"learning_rate\": 0.01,\n",
    "    \"weight_decay\": 0.0005,\n",
    "    \"epoch\": 5,\n",
    "    \"node_type\": \"paper\",\n",
    "    \"edge_type\": \"cites\",\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After define training process and hyperparameters,\n",
    "\n",
    "Now we can start the traning process with learning engine \"\" and the hyperparameters configurations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "train(config, lg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
